Setting Up

Functions

First, we load some re-usable functions for our activity. You can inspect, edit, and add to the functions by opening the relevant .R script. But for now, we can just source them.

source("R//functions//functions.R")

Let’s now load the necessary libraries for this activity:

library(sf)
library(ggplot2)
library(tmap)
library(dplyr)
library(scales)
library(leaflet)
library(rnaturalearth)
library(rnaturalearthdata)
library(htmlwidgets)

Acquisition and Ingestion

Downloading and Loading Data

Now, let’s start by using our file_grabber() function to download some data from the London Data Store.

# Download and unzip our chosen shapefiles from the London Data Store
file_grabber(file_name_dl = "statistical-gis-boundaries-london.zip", 
             file_name_final = "MSOA_2011_London_gen_MHW.shp", 
              file_path = "/Users/shujaali/gv330_2425/Seminars/seminar5_spatialData/data/", 
             url = "https://data.london.gov.uk/download/statistical-gis-boundary-files-london/9ba8c833-6370-4b11-abdc-314aa020d5e0",
             compressed = TRUE)
## [1] "File already downloaded."
## [1] "File unzipped."

Now that we have downloaded the data, let’s load it into R using the sf package. Here will we not focus on functional programming, so that we can inspect our objects manually and learn about them.

When handling spatial data, we need to be especially attuned to the coordinate reference system (CRS) of the data. This is because the CRS determines how the data, which are actually in three dimensions, is projected onto a 2D plane. If you just work with one shapefile the CRS isn’t so important (though it will change what any map you render looks like), but the moment you combine multiple spatial data sources, you need to make sure they are in the same CRS or you can accidentally end up with data that is misaligned.

# Read the shapefile
shapefile_name <- "MSOA_2011_London_gen_MHW.shp"
london_msoa_2011 <- st_read(paste0("data/shapefiles/statistical-gis-boundaries-london/ESRI/",shapefile_name))
## Reading layer `MSOA_2011_London_gen_MHW' from data source 
##   `/Users/shujaali/gv330_2425/Seminars/seminar5_spatialData/data/shapefiles/statistical-gis-boundaries-london/ESRI/MSOA_2011_London_gen_MHW.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 983 features and 12 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 503574.2 ymin: 155850.8 xmax: 561956.7 ymax: 200933.6
## Projected CRS: OSGB36 / British National Grid
# Look at the data -- note, it doesn't look like a regular tibble or data frame, there is other info there. 
london_msoa_2011
## Simple feature collection with 983 features and 12 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 503574.2 ymin: 155850.8 xmax: 561956.7 ymax: 200933.6
## Projected CRS: OSGB36 / British National Grid
## First 10 features:
##     MSOA11CD                 MSOA11NM   LAD11CD              LAD11NM   RGN11CD
## 1  E02000001       City of London 001 E09000001       City of London E12000007
## 2  E02000002 Barking and Dagenham 001 E09000002 Barking and Dagenham E12000007
## 3  E02000003 Barking and Dagenham 002 E09000002 Barking and Dagenham E12000007
## 4  E02000004 Barking and Dagenham 003 E09000002 Barking and Dagenham E12000007
## 5  E02000005 Barking and Dagenham 004 E09000002 Barking and Dagenham E12000007
## 6  E02000007 Barking and Dagenham 006 E09000002 Barking and Dagenham E12000007
## 7  E02000008 Barking and Dagenham 007 E09000002 Barking and Dagenham E12000007
## 8  E02000009 Barking and Dagenham 008 E09000002 Barking and Dagenham E12000007
## 9  E02000010 Barking and Dagenham 009 E09000002 Barking and Dagenham E12000007
## 10 E02000011 Barking and Dagenham 010 E09000002 Barking and Dagenham E12000007
##    RGN11NM USUALRES HHOLDRES COMESTRES POPDEN HHOLDS AVHHOLDSZ
## 1   London     7375     7187       188   25.5   4385       1.6
## 2   London     6775     6724        51   31.3   2713       2.5
## 3   London    10045    10033        12   46.9   3834       2.6
## 4   London     6182     5937       245   24.8   2318       2.6
## 5   London     8562     8562         0   72.1   3183       2.7
## 6   London     8791     8672       119   50.6   3441       2.5
## 7   London    11569    11564         5   81.5   4591       2.5
## 8   London     8395     8376        19   87.4   3212       2.6
## 9   London     8615     8615         0   76.8   3292       2.6
## 10  London     6187     6086       101   38.8   2289       2.7
##                          geometry
## 1  MULTIPOLYGON (((531667.6 18...
## 2  MULTIPOLYGON (((548881.6 19...
## 3  MULTIPOLYGON (((549102.4 18...
## 4  MULTIPOLYGON (((551550 1873...
## 5  MULTIPOLYGON (((549099.6 18...
## 6  MULTIPOLYGON (((549819.9 18...
## 7  MULTIPOLYGON (((548171.4 18...
## 8  MULTIPOLYGON (((546855 1863...
## 9  MULTIPOLYGON (((549618.8 18...
## 10 MULTIPOLYGON (((550244.1 18...
# It turns out that some (just 4) of the polygons are not valid (i.e., they don't join up properly).
sum(!st_is_valid(london_msoa_2011)) # check if any issues? 4 exist
## [1] 4
london_msoa_2011 <- st_make_valid(london_msoa_2011) # correct them
sum(!st_is_valid(london_msoa_2011)) # check again?
## [1] 0
# Make a very boring map - behold, this is London!
plot(london_msoa_2011$geometry)

# Set a global variable that will be our project CRS, based on the London shapefile crs
proj_crs <- st_crs(london_msoa_2011)
proj_crs$input
## [1] "OSGB36 / British National Grid"

Mapping: Fundamentals

Mapping MSOA – Choropleths

Let’s start by making a very simple map of population density by MSOA. For this, a choropleth map is a good choice as the variable. That is because the variable we are going to map is scaled by area and thus invariant to the visual area of the unit we are mapping. We’ll do this two ways, first using ggplot2, and then using tmap.

# Using ggplot2 -- note, ggplot2 maps are by default quite visually noisy, so we remove a lot of that using theme_void()
choropleth_map_gg <- 
  ggplot() + 
    geom_sf(data = london_msoa_2011, aes(fill = POPDEN), colour = "gray30") + # Choropleth map with POPDEN variable
    ggtitle("Population Density in London MSOAs") + 
    scale_fill_viridis_c(name = "Population Density", labels = scales::comma) + # Choose a viridis colour scale
    theme_void() + # Use theme_void() for minimalist style
    theme(plot.margin = unit(c(0.2, 0.2, 0.2, 0.2), "cm")) + # Reduce margins a bit
    labs(fill = "Population Density") # Customize legend title

choropleth_map_gg

# We can easily save ggplot map output:
ggsave("output/choropleth_map_gg.pdf", choropleth_map_gg, width = 9, height = 6)

Now let’s use tmap:

# The same map using tmap, which defaults to a much cleaner presentation, so we don't have to do too much (just switch off the frame):
choropleth_map_tm <- 
  tm_shape(london_msoa_2011) +
    tm_fill(col = "POPDEN", title = "Population Density", style = "cont", palette = "viridis") +
    tm_borders(col = "gray30") +
    tm_layout(title = "Population Density in London MSOAs", inner.margins = c(0.1, 0.1, 0.1, 0.1), frame = FALSE)
## 
## ── tmap v3 code detected ───────────────────────────────────────────────────────
## [v3->v4] `tm_polygons()`: instead of `style = "cont"`, use fill.scale =
## `tm_scale_continuous()`.
## ℹ Migrate the argument(s) 'palette' (rename to 'values') to
##   'tm_scale_continuous(<HERE>)'
## [v3->v4] `tm_polygons()`: use 'fill' for the fill color of polygons/symbols
## (instead of 'col'), and 'col' for the outlines (instead of 'border.col').
## [v3->v4] `tm_polygons()`: migrate the argument(s) related to the legend of the
## visual variable `fill` namely 'title' to 'fill.legend = tm_legend(<HERE>)'
## [v3->v4] `tm_layout()`: use `tm_title()` instead of `tm_layout(title = )`
choropleth_map_tm

# We can also easily save ggplot map output:
tmap_save(choropleth_map_tm, "output/choropleth_map_tm.pdf", width = 9, height = 6)
## Map saved to /Users/shujaali/gv330_2425/Seminars/seminar5_spatialData/output/choropleth_map_tm.pdf
## Size: 9 by 6 inches

Mapping MSOA – Scaled Points

Choropleth maps are good for area-invariant metrics, but what if we want to map something like absolute population (not density)? Let’s quickly see what happens…

tm_shape(london_msoa_2011) +
  tm_fill(col = "USUALRES", title = "Usual Residents", style = "cont", palette = "viridis") +
  tm_borders(col = "gray30") +
  tm_layout(title = "Usual Residents in London MSOAs", inner.margins = c(0.1, 0.1, 0.1, 0.1), frame = FALSE)
## 
## ── tmap v3 code detected ───────────────────────────────────────────────────────
## [v3->v4] `tm_polygons()`: instead of `style = "cont"`, use fill.scale =
## `tm_scale_continuous()`.
## ℹ Migrate the argument(s) 'palette' (rename to 'values') to
##   'tm_scale_continuous(<HERE>)'
## [v3->v4] `tm_polygons()`: migrate the argument(s) related to the legend of the
## visual variable `fill` namely 'title' to 'fill.legend = tm_legend(<HERE>)'
## [v3->v4] `tm_layout()`: use `tm_title()` instead of `tm_layout(title = )`

This becomes visually confusing – the colors (which give us an absolute unscaled metric) don’t make sense, given what we know (or at least expect) about London’s population distribution. What’s going on? Well, the magnitude of the variable is not invariant to the area of the unit we are mapping. Central parts of London will have very high population counts, but also very small areas.

An alternative approach would be scaled points. This is a good choice when you want to show the absolute magnitude of a variable, but you don’t want the visual area of the unit to distort the visual representation.

# First, pick out the centroids of the MSOAs, creating a new object
centroids <- st_centroid(london_msoa_2011)
## Warning: st_centroid assumes attributes are constant over geometries
# Second, plot both the polygon (for reference) and the points, scaled to USUALRES. Here, with ggplot:
scaled_points_map_gg <- 
  ggplot() + 
    geom_sf(data = london_msoa_2011, fill = "transparent", color = "gray30", alpha = 0.7) + # Outline of MSOAs for reference
    geom_sf(data = centroids, aes(size = USUALRES), color = "blue", alpha = 0.5) + # Points at centroids, size scaled to USUALRES by aes()
    scale_size_continuous(name = "Usual Residents", range = c(0.1, 3), breaks = scales::breaks_pretty(n = 5)) + # Use range to set lower and upper limits of point size
    ggtitle("Population by MSOA in London") + 
    theme_void() + # Use theme_void() for minimalist style
    theme(plot.margin = unit(c(0.2, 0.2, 0.2, 0.2), "cm"))  # Reduce margins a bit

scaled_points_map_gg

ggsave("output/scaled_points_map_gg.pdf", scaled_points_map_gg, width = 9, height = 6)

scaled_points_map_tm <- 
  tm_shape(london_msoa_2011) + # Layer 1 is the msoa polygon
    tm_borders(col = "gray30") + # Outline of MSOAs
  tm_shape(centroids) + # Layer 2 is the centroids. Note how this works differently to ggplot. We go tm_shap() for the shapefile, then add grammar below.
  tm_bubbles(size = "USUALRES", col = "blue", style = "pretty", alpha = 0.5, title.size = "Usual Residents", scale = 0.6) + # Points at centroids, size scaled to USUALRES
  tm_layout(title = "Population by MSOA in London", inner.margins = c(0.1, 0.1, 0.1, 0.1), frame = FALSE)  # Set map title
## 
## ── tmap v3 code detected ───────────────────────────────────────────────────────
## [v3->v4] `tm_bubbles()`: instead of `style = "pretty"`, use fill.scale =
## `tm_scale_intervals()`.
## ℹ Migrate the argument(s) 'style' to 'tm_scale_intervals(<HERE>)'
## [v3->v4] `tm_bubbles()`: use `fill_alpha` instead of `alpha`.
## [v3->v4] `tm_tm_bubbles()`: migrate the argument(s) related to the scale of the
## visual variable `size` namely 'scale' (rename to 'values.scale') to size.scale
## = tm_scale_continuous(<HERE>).
## ℹ For small multiples, specify a 'tm_scale_' for each multiple, and put them in
##   a list: 'size.scale = list(<scale1>, <scale2>, ...)'
## [v3->v4] `tm_layout()`: use `tm_title()` instead of `tm_layout(title = )`
scaled_points_map_tm

tmap_save(scaled_points_map_tm, "output/scaled_points_map_tm.pdf", width = 9, height = 6)
## Map saved to /Users/shujaali/gv330_2425/Seminars/seminar5_spatialData/output/scaled_points_map_tm.pdf
## Size: 9 by 6 inches

Applied Example: Rapid Charging Points

Acquisition, Ingestion, Transformation for Rapid Chargers

# Download the data from the London Datastore. We can re-use our function:
file_grabber(file_name_dl = "Rapid_charging_points.gpkg", # this is a .gpkg geo-package, not a .shp shapefile
             file_path = "data/", 
             url = "https://data.london.gov.uk/download/electric_vehicle_charging_site/8ef9c743-c01d-4329-8239-8f858ff4de53",
             compressed = FALSE)
## [1] "File already downloaded."
## [1] "File already unzipped or no need for unzipping."
# Ingest rapid charging points data
rapid_chargers <- st_read("data/Rapid_charging_points.gpkg")
## Reading layer `Rapid_charging_points' from data source 
##   `/Users/shujaali/gv330_2425/Seminars/seminar5_spatialData/data/Rapid_charging_points.gpkg' 
##   using driver `GPKG'
## Simple feature collection with 156 features and 12 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 505012.1 ymin: 157855.5 xmax: 555472.3 ymax: 199091.7
## Projected CRS: unnamed
rapid_chargers
## Simple feature collection with 156 features and 12 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 505012.1 ymin: 157855.5 xmax: 555472.3 ymax: 199091.7
## Projected CRS: unnamed
## First 10 features:
##                 borough            easting  latitude longtitude
## 1              Richmond  520008.8994827288 51.464253  -0.273817
## 2                Newham  543605.6261034013 51.523421   0.068574
## 3                 Brent 524592.31514180696 51.546586  -0.204599
## 4             Redbridge 540514.75320855307 51.591184   0.027028
## 5             Greenwich 540531.48913699761 51.476538   0.022182
## 6              Haringey 530017.97962389176 51.579178  -0.125007
## 7  Hammersmith & Fulham  522999.1943170313 51.519975  -0.228609
## 8              Richmond 520271.88623687194 51.464093  -0.270037
## 9           Westminster 526373.98381476535 51.511661  -0.180296
## 10          Westminster 526115.96909061982 51.523033  -0.183562
##              northing numberrcpoints objectid siteid
## 1  175332.84072642814              1        1      2
## 2  182528.53009116615              2        2     12
## 3  184604.23447573709              1        3     63
## 4  189983.24549439573              1       35    558
## 5  177225.12555734196              1        4     74
## 6  188367.05681860622              2        5     78
## 7  181604.36517970893              3        6     87
## 8  175321.24338926666              1        7    129
## 9  180762.05435913772              1        8    169
## 10 182021.04071718361              1        9    174
##                                                sitename taxipublicuses
## 1                  Upper Richmond Road west of Coval Rd     Public use
## 2                           Claps Gate Lane Retail park     Public use
## 3      1 Christchurch Avenue, Kilburn Station, NW6 7QP            Taxi
## 4                       South Woodford Station Car Park           Taxi
## 5      Old Dover Road west of Dornberg Close - car park     Public use
## 6                    Crouch Hall Rd opp Bryanstone Road     Public use
## 7                     Scrubbs Lane - Wood Lane Car Park     Public use
## 8  A205 Upper Richmond Road West east of Carlton Road.      Public use
## 9                                        Lancaster Gate           Taxi
## 10                     Warwick Avenue by Clifton Villas           Taxi
##                                                                      url
## 1  https://tfl.gov.uk/modes/driving/electric-vehicles-and-rapid-charging
## 2  https://tfl.gov.uk/modes/driving/electric-vehicles-and-rapid-charging
## 3  https://tfl.gov.uk/modes/driving/electric-vehicles-and-rapid-charging
## 4  https://tfl.gov.uk/modes/driving/electric-vehicles-and-rapid-charging
## 5  https://tfl.gov.uk/modes/driving/electric-vehicles-and-rapid-charging
## 6  https://tfl.gov.uk/modes/driving/electric-vehicles-and-rapid-charging
## 7  https://tfl.gov.uk/modes/driving/electric-vehicles-and-rapid-charging
## 8  https://tfl.gov.uk/modes/driving/electric-vehicles-and-rapid-charging
## 9  https://tfl.gov.uk/modes/driving/electric-vehicles-and-rapid-charging
## 10 https://tfl.gov.uk/modes/driving/electric-vehicles-and-rapid-charging
##       runtime                      geom
## 1  08/29/2019 POINT (520008.9 175332.8)
## 2  08/29/2019 POINT (543605.6 182528.5)
## 3  08/29/2019 POINT (524592.3 184604.2)
## 4  08/29/2019 POINT (540514.7 189983.2)
## 5  08/29/2019 POINT (540531.5 177225.1)
## 6  08/29/2019   POINT (530017.9 188367)
## 7  08/29/2019 POINT (522999.2 181604.3)
## 8  08/29/2019 POINT (520271.9 175321.2)
## 9  08/29/2019   POINT (526373.9 180762)
## 10 08/29/2019   POINT (526115.9 182021)

Now let’s transform and manipulate our data:

# Set the CRS of the main polygon shapefile (we do not re-project here)
rapid_chargers <- rapid_chargers |>
  st_set_crs(proj_crs) 
## Warning: st_crs<- : replacing crs does not reproject data; use st_transform for
## that
# For interest, what if we want to find out which MSOA each charger is in? 
# Create a new object that conducts a spatial join the rapid chargers data to the shapefile by MSOA name. We use join = st_intersects. 
rapid_chargers_MSOA <- st_join(rapid_chargers, london_msoa_2011, join = st_intersects)

# Check that we haven't lost or duped rows -- don't skip this step!
nrow(rapid_chargers_MSOA) == nrow(rapid_chargers) 
## [1] TRUE
# Note: What if we joined the other way? This is the Bad Place!
bad_join <- st_join(london_msoa_2011, rapid_chargers, join = st_intersects)
nrow(bad_join) == nrow(london_msoa_2011)
## [1] FALSE
# What if instead we wanted to count the number of chargers in each MSOA? And then create a binary indicator for presence of at least one?
london_msoa_2011_counts <- london_msoa_2011 |> 
  transform(charger_count = lengths(st_intersects(london_msoa_2011, rapid_chargers))) |>
  transform(has_charger = ifelse(charger_count>0,"Yes","No"))

# Check the join:
nrow(london_msoa_2011_counts) == nrow(london_msoa_2011) 
## [1] TRUE

Visualisation of Rapid Chargers

# Plot charging stations and London boundaries together using ggplot2, diff colours by number of charge points
rapid_chargers_gg <- 
  ggplot() + 
    geom_sf(data = london_msoa_2011, fill = "transparent", color = "gray30") + # Outline of MSOAs for reference
    geom_sf(data = rapid_chargers, aes(color = numberrcpoints), size = 3, alpha = .8) +
    ggtitle("Rapid Charging Stations in London") + 
    theme_void() + # Use theme_void() for minimalist style
    theme(plot.margin = unit(c(0.2, 0.2, 0.2, 0.2), "cm")) +  # Reduce margins a bit
    labs(color = "Number of Charge Points")  # Relabel the legend

rapid_chargers_gg

# Plot charging stations and London boundaries together using tmap, diff colours by number of charge points
rapid_chargers_tm <- 
  tm_shape(london_msoa_2011) +
    tm_borders(col = "gray30") + # Outline of MSOAs
  tm_shape(rapid_chargers) +
    tm_dots(col = "numberrcpoints", title = "Number of Charge Points", scale = 4, alpha = .8) + # Adjust dot size
  tm_layout(title = "Rapid Charging Stations in London", frame = FALSE) # Set map title
## 
## ── tmap v3 code detected ───────────────────────────────────────────────────────
## [v3->v4] `tm_dots()`: use `fill_alpha` instead of `alpha`.
## [v3->v4] `tm_tm_dots()`: migrate the argument(s) related to the scale of the
## visual variable `size` namely 'scale' (rename to 'values.scale') to size.scale
## = tm_scale_continuous(<HERE>).
## ℹ For small multiples, specify a 'tm_scale_' for each multiple, and put them in
##   a list: 'size.scale = list(<scale1>, <scale2>, ...)'
## [tm_dots()] Argument `title` unknown.
## [v3->v4] `tm_layout()`: use `tm_title()` instead of `tm_layout(title = )`
rapid_chargers_tm

# Choropleth plot where MSOAs with at least 1 charger are highlighted:
rapid_chargers_choro_gg <- 
  ggplot() + 
  geom_sf(data = london_msoa_2011_counts, aes(fill = has_charger), color = "gray30") +
  theme_void() + # Use theme_void() for minimalist style
  theme(plot.margin = unit(c(0.2, 0.2, 0.2, 0.2), "cm")) + # Reduce margins a bit
  labs(fill = "Charger Present?") + # Customize legend title
  ggtitle("Rapid Charging Coverage in London")

rapid_chargers_choro_gg

# And again, in tmap:
rapid_chargers_choro_tm <- 
  tm_shape(london_msoa_2011_counts) +
  tm_borders(col = "gray30") +
  tm_fill(col = "has_charger", title = "Charger Present?") +
  tm_layout(title = "Rapid Charging Coverage in London", inner.margins = c(0.1, 0.1, 0.1, 0.1), frame = FALSE)  # Set map title
## 
## ── tmap v3 code detected ───────────────────────────────────────────────────────
## [v3->v4] `tm_polygons()`: migrate the argument(s) related to the legend of the
## visual variable `fill` namely 'title' to 'fill.legend = tm_legend(<HERE>)'[v3->v4] `tm_layout()`: use `tm_title()` instead of `tm_layout(title = )`
rapid_chargers_choro_tm

Applied Example: Schools Data

Let’s try a more challenging case: a dataset of points that is not yet a spatial object (just a .csv).

Acquisition and Ingestion - Schools

file_grabber(file_name_dl = "all_schools_xy_2016.csv", 
             file_path = "data/", 
             url = "https://data.london.gov.uk/download/london-schools-atlas/57046151-39a0-45d9-8dc0-27ea7fd02de8",
             compressed = FALSE)
## [1] "File already downloaded."
## [1] "File already unzipped or no need for unzipping."
# Load schools data
schools <- read.csv("data/all_schools_xy_2016.csv")

We have to be a little careful here, as the chosen project crs (declared above) happens to convert our x and y into metres, rather than using lat/long. We have to take two steps here – first, we identify coordinates in our data frame and apply a CRS (‘4326’), making this an sf object. We then project it into the projected CRS. We didn’t have to do this two-step procedure with rapid chargers as that was already a spatial object.

schools_sf <- schools |>
  st_as_sf(coords = c("x", "y"), crs = 4326) |>
  st_transform(proj_crs)

schools_sf
## Simple feature collection with 3889 features and 22 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 499101.9 ymin: 151893 xmax: 564318.2 ymax: 205041.7
## Projected CRS: OSGB36 / British National Grid
## First 10 features:
##    OBJECTID    URN                                         SCHOOL_NAM
## 1         1 135155                        Ayesha Siddiqa Girls School
## 2         2 140492                                 Beis Medrash Elyon
## 3         3 141411                    Big Creative Independent School
## 4         4 142336                             Wetherby Senior School
## 5         5 100042 St Mary's Kilburn Church of England Primary School
## 6         6 100224                         De Beauvoir Primary School
## 7         7 100235                        Queensbridge Primary School
## 8         8 100239                        Princess May Primary School
## 9         9 100263      Holy Trinity Church of England Primary School
## 10       10 100483                         Marlborough Primary School
##                        TYPE          PHASE                  ADDRESS        TOWN
## 1  Other Independent School Not applicable     165-169 The Broadway    Southall
## 2  Other Independent School Not applicable 233 West Hendon Broadway      London
## 3  Other Independent School Not applicable       Silver Birch House Walthamstow
## 4  Other Independent School Not applicable      100 Marylebone Lane      London
## 5    Voluntary Aided School        Primary                Quex Road      London
## 6          Community School        Primary        80 Tottenham Road      London
## 7          Community School        Primary        Queensbridge Road      London
## 8          Community School        Primary        Princess May Road      London
## 9    Voluntary Aided School        Primary            Richmond Road      London
## 10         Community School        Primary          Draycott Avenue      London
##    POSTCODE STATUS GENDER EASTING NORTHING               WARD_NAME
## 1   UB1 1LR   Open  Girls  521263   180470       Southall Broadway
## 2   NW9 7DG   Open   Boys  521939   188148             West Hendon
## 3   E17 5SD   Open  Mixed  535764   190188             Higham Hill
## 4   W1U 2QB   Open   Boys  528432   181474  Marylebone High Street
## 5   NW6 4PG   Open  Mixed  525453   183984                 Kilburn
## 6    N1 4BS   Open  Mixed  533252   184710             De Beauvoir
## 7    E8 3ND   Open  Mixed  533902   184050            Queensbridge
## 8   N16 8AJ   Open  Mixed  533512   185478 Stoke Newington Central
## 9    E8 3DY   Open  Mixed  533659   184613            Queensbridge
## 10  SW3 3AP   Open  Mixed  527415   178681               Hans Town
##                      LSOA_NAME                LA_NAME
## 1                  Ealing 026C                 Ealing
## 2                  Barnet 036F                 Barnet
## 3          Waltham Forest 014C         Waltham Forest
## 4             Westminster 011B            Westminster
## 5                  Camden 020C                 Camden
## 6                 Hackney 021E                Hackney
## 7                 Hackney 020E                Hackney
## 8                 Hackney 014F                Hackney
## 9                 Hackney 021I                Hackney
## 10 Kensington and Chelsea 014C Kensington and Chelsea
##                                    WEBLINK     AGE     map_icon NEW_URN OLD_URN
## 1                                           19-Nov                   NA      NA
## 2                                           16-Nov                   NA      NA
## 3                                          15 - 16                   NA      NA
## 4                                           16-Nov                   NA      NA
## 5  http://www.stmarykilburn.camden.sch.uk/  11-Mar    VOLUNTARY      NA      NA
## 6           www.debeauvoir.hackney.sch.uk/  11-Mar STATE-FUNDED      NA      NA
## 7         www.queensbridge.hackney.sch.uk/  11-Mar STATE-FUNDED      NA      NA
## 8          www.princessmay.hackney.sch.uk/  11-Mar STATE-FUNDED      NA      NA
## 9          www.holytrinity.hackney.sch.uk/  11-Mar    VOLUNTARY      NA      NA
## 10      http://www.marlborough.rbkc.sch.uk  11-Mar STATE-FUNDED      NA      NA
##    map_icon_l Primary                  geometry
## 1           2       0 POINT (512629.7 179975.9)
## 2           2       0   POINT (521936.5 188146)
## 3           2       0   POINT (535682.4 190165)
## 4           2       0 POINT (528429.4 181474.4)
## 5           2       1 POINT (525386.4 183935.4)
## 6           2       1 POINT (533441.4 184685.8)
## 7           2       1 POINT (533822.3 184028.1)
## 8           2       1 POINT (533377.3 185518.7)
## 9           2       1 POINT (533690.4 184403.3)
## 10          2       1 POINT (527412.5 178677.9)
# Let's do a quick bit of data munging with this dataset: 
# First, let's create a new variable that focuses in on just the four most common types of schools
schools_sf <- schools_sf |>
  transform(TYPE_REDUCED = ifelse(TYPE %in% c('Academy Converter', 'Community School', 'Other Independent School', 'Voluntary Aided School'), TYPE, NA))

# Quick check
table(schools_sf$TYPE_REDUCED)
## 
##        Academy Converter         Community School Other Independent School 
##                      417                     1248                      897 
##   Voluntary Aided School 
##                      589
# Second, let's fix an absolutely hilarious excel error in the data.
# Excel has taken the variable AGE (which gives the school age range) and interpreted it as a date.
# I.e., cases where the school is 11-19 years, are reported as 19-Nov in the data! Fortunately, we can reverse engineer the right data...

# First, find those cases where this has gone wrong by doing a string search for any of the months (month.abb is build into base R), and create a dummy variable:

# extract all second parts of the age string:
parts <- strsplit(schools_sf$AGE,"-")


schools_sf <- schools_sf |>
  transform(AGE_MAX = as.numeric(sapply(schools_sf$AGE, function(date_string) { # Extract the max age as the first item before the -
    date_parts <- strsplit(date_string, "-")
    return(date_parts[[1]][1])
  }))) |>
  transform(AGE_MIN = sapply(schools_sf$AGE, function(date_string) { # Extract the min age (which is often incorrectly reported as a month) as the item after the -
    date_parts <- strsplit(date_string, "-")
    return(date_parts[[1]][2])
  })) |>
  transform(AGE_CORRUPTED = ifelse(AGE_MIN %in% month.abb, 1, 0))

# Extract numeric values that are still present in AGE_MIN (there are some that excel didn't corrupt)
numeric_values <- schools_sf$AGE_MIN[!schools_sf$AGE_MIN %in% month.abb]

# Define a mapping between month names and their corresponding true numeric values
month_mapping <- c("Jan" = 1, "Feb" = 2, "Mar" = 3, "Apr" = 4, "May" = 5, "Jun" = 6,
                   "Jul" = 7, "Aug" = 8, "Sep" = 9, "Oct" = 10, "Nov" = 11, "Dec" = 12)

# Add the numeric values we extracted to the mapping
month_mapping <- c(month_mapping, setNames(numeric_values, as.character(numeric_values)))

# Use the mapping to convert month names and numeric values to final numeric values
schools_sf$AGE_MIN <- month_mapping[as.character(schools_sf$AGE_MIN)]

# Create age range variable, and remove excess white space:
schools_sf <- schools_sf |>
  transform(AGE_RANGE = ifelse(AGE_CORRUPTED == 1, paste0(AGE_MIN, "-", AGE_MAX), AGE)) |>
  transform(AGE_RANGE = gsub(" ", "", AGE_RANGE, fixed = TRUE))

# Quick check
table(schools_sf$AGE_RANGE)
## 
##  0-11  0-13   0-6   0-7  1-11  1-13  1-14  1-19   1-5   1-6   1-7 10-16 10-17 
##     2     2     1     1     1     2     2     2     1     1     1    10     2 
## 10-18 10-19 11-13 11-16 11-17 11-18 11-19 11-20 11-21 12-16 12-18 12-19 13-16 
##    10    14     1   135     6   367   102     1     1     3     3     1     6 
## 13-17 13-18 13-19 13-20 13-21 14-16 14-18 14-19 14-20 14-21 14-25 14-35 15-16 
##     2     7     3     1     1    11     4    22     1     3     1     1     1 
## 15-18 15-19 15-24  2-11  2-12  2-13  2-14  2-16  2-17  2-18  2-19   2-5   2-6 
##     1     3     1    94     4    36    10    10     2     8    28     1     9 
##   2-7   2-8  3-10  3-11  3-12  3-13  3-14  3-15  3-16  3-17  3-18  3-19  3-20 
##    19     2     3  1359    12    28    10     2    52     6    78    62     2 
##  3-25   3-5   3-6   3-7   3-8   3-9  4-11  4-12  4-13  4-16  4-18  4-19   4-7 
##     2     3     4   113     5     2   417     4    42    34    74    36     9 
##   4-8   4-9  5-10  5-11  5-12  5-13  5-14  5-16  5-18  5-19  5-21   5-7  6-13 
##     2     1     2   133     6     4     8    26    12    18     2    56     4 
##  6-14  6-15  6-16  7-11  7-13  7-14  7-16  7-17  7-18  7-19  8-13  8-14  8-16 
##     2     2     6   166    12    10    14     2    24    18     4     2     6 
##  8-18  8-19  9-16  9-18  9-19 
##     6    10     2     4     2

Static Visualisation of Schools

Let’s quickly make a static map of the schools. We can improve our previous mapping by adding a bit of jazz to our presentation…

# Plot schools data using ggplot2 
schools_plot_gg <- 
  ggplot() + 
    geom_sf(data = london_msoa_2011, fill = "transparent", color = "gray30") + # Set fill to transparent and color to black
    geom_sf(data = schools_sf, shape = 19, aes(color = TYPE_REDUCED), alpha = .5, size = 1) + 
    ggtitle("Schools Locations in London") + 
    coord_sf() +
    theme_void() + 
    theme(plot.margin = unit(c(0.2, 0.2, 0.2, 0.2), "cm")) + 
    labs(color = "School Type")  # Relabel the legend

schools_plot_gg

# The same map using tmap
schools_plot_tm <-
  tm_shape(london_msoa_2011) +
    tm_borders(col = "gray30") +
    tm_shape(schools_sf) +
    tm_dots(shape = 19, col = "TYPE_REDUCED", title = "School Type", alpha = 0.5, scale = 2.5) +
    tm_layout(title = "Schools Locations in London", inner.margins = c(0.1, 0.1, 0.1, 0.1), frame = FALSE)
## 
## ── tmap v3 code detected ───────────────────────────────────────────────────────
## [v3->v4] `tm_dots()`: use `fill_alpha` instead of `alpha`.
## [v3->v4] `tm_tm_dots()`: migrate the argument(s) related to the scale of the
## visual variable `size` namely 'scale' (rename to 'values.scale') to size.scale
## = tm_scale_continuous(<HERE>).
## ℹ For small multiples, specify a 'tm_scale_' for each multiple, and put them in
##   a list: 'size.scale = list(<scale1>, <scale2>, ...)'
## [tm_dots()] Argument `title` unknown.
## [v3->v4] `tm_layout()`: use `tm_title()` instead of `tm_layout(title = )`
schools_plot_tm

# We can always jazz up the above maps a little, for example, using tmap:
schools_plot_jazz_tm <- 
  tm_shape(london_msoa_2011) +
    tm_fill(alpha = 0.1, fill = "gray60") +
    tm_borders(col= 'white') +
  tm_shape(schools_sf) +
    tm_dots(shape = 19, col = "TYPE_REDUCED", title = "School Type", alpha = 0.5, scale = 2.5) +
  tm_layout(title = "Schools Locations in London",  
              title.color = 'white',
              legend.text.color = 'white',
              legend.title.color = 'white',
              inner.margins = c(0.1, 0.1, 0.1, 0.1), frame = FALSE, bg.color = 'gray10')
## 
## ── tmap v3 code detected ───────────────────────────────────────────────────────
## [v3->v4] `tm_polygons()`: use `fill_alpha` instead of `alpha`.[v3->v4] `tm_dots()`: use `fill_alpha` instead of `alpha`.[v3->v4] `tm_tm_dots()`: migrate the argument(s) related to the scale of the
## visual variable `size` namely 'scale' (rename to 'values.scale') to size.scale
## = tm_scale_continuous(<HERE>).
## ℹ For small multiples, specify a 'tm_scale_' for each multiple, and put them in
##   a list: 'size.scale = list(<scale1>, <scale2>, ...)'[tm_dots()] Argument `title` unknown.[v3->v4] `tm_layout()`: use `tm_title()` instead of `tm_layout(title = )`
schools_plot_jazz_tm

Dynamic Visualisation of Schools

Finally, let’s make a dynamic/interactive map. This is going to allow us to actually move around the map, zoom in, zoom out, click on individual data points and see their data, etc.

# Let's create an interactive map with a basemap, again using tmap: 
# First, set the tmap_mode to viewing:
library(tmap)
library(sf)

tmap_mode('view')
## ℹ tmap mode set to "view".
# Second, create the interactive map:
schools_plot_interactive <-
  tm_shape(london_msoa_2011) +
  tm_borders() +
  tm_shape(schools_sf) +
  tm_dots(shape = 19, col = "TYPE_REDUCED", title = "School Type", alpha = 0.5, scale = 2.5, 
          id = "SCHOOL_NAM", # the scroll-over name
          popup.vars = c("AGE_RANGE", "GENDER", "STATUS", "WARD_NAME")) + # variables when you click -- note the hilarious excel error in AGE
  tm_layout(title = "Schools Locations in London", inner.margins = c(0.1, 0.1, 0.1, 0.1), frame = FALSE) +
  tm_basemap(c('OpenStreetMap'))
## 
## ── tmap v3 code detected ───────────────────────────────────────────────────────
## [v3->v4] `tm_dots()`: use `fill_alpha` instead of `alpha`.[v3->v4] `tm_tm_dots()`: migrate the argument(s) related to the scale of the
## visual variable `size` namely 'scale' (rename to 'values.scale') to size.scale
## = tm_scale_continuous(<HERE>).
## ℹ For small multiples, specify a 'tm_scale_' for each multiple, and put them in
##   a list: 'size.scale = list(<scale1>, <scale2>, ...)'[tm_dots()] Argument `title` unknown.[v3->v4] `tm_layout()`: use `tm_title()` instead of `tm_layout(title = )`
schools_plot_interactive
## Registered S3 method overwritten by 'jsonify':
##   method     from    
##   print.json jsonlite
# We can save it as an .html file:
# You can also take a static image using mapshot()
tmap_save(schools_plot_interactive, "output/schools_plot_dynamic.html") 
## Interactive map saved to output/schools_plot_dynamic.html
# Revert to plotting mode
tmap_mode('plot')
## ℹ tmap mode set to "plot".